Reconfiguration and Transient Recovery in State Machine Architectures
Abstract
We consider an architecture for ultra-dependable operation based on synchronized state machine replication, extended to provide transient recovery and reconfiguration in the presence of arbitrary faults. The architecture allows processors suspected of being faulty to be placed on "probation." Processors in this status cannot disrupt other processors, but those that are nonfaulty or recovering from transient faults are able to remain synchronized with the other processors and with each other, can participate in interactively consistent exchange of data (i.e., Byzantine agreement), and can restore damaged state data by loading majority-voted copies from other processors. The processors that are not on probation are able to coordinate membership of their group and to take processors on and off probation. These properties are achieved even if all the processors on probation and some of the others exhibit Byzantine faults, provided a majority of all processors are nonfaulty. Key elements of the architecture are modified treatments for the problems of interactive consistency, clock synchronization, and group membership. Classical algorithms for these problems that tolerate t Byzantine faults among n processors are extended to tolerate t + p faults among n + p processors, partitioned into n "core members" and p "probationers," provided no more than t faults occur among the core members.
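The recovery step mentioned in the abstract, in which a processor restores damaged state by loading majority-voted copies from other processors, can be illustrated with a minimal Python sketch. The function names `majority_vote` and `restore_state` are hypothetical and do not come from the paper. The sketch assumes the copies have already been delivered by an interactively consistent exchange; it does not implement Byzantine agreement, clock synchronization, or group membership.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(copies: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value held by a strict majority of the copies, or None.

    Illustrative helper (not from the paper). None doubles as the
    "no majority" signal, so None is assumed not to be a legal state value.
    """
    if not copies:
        return None
    value, count = Counter(copies).most_common(1)[0]
    # A strict majority means more than half of all received copies agree.
    return value if 2 * count > len(copies) else None


def restore_state(received: Sequence[Hashable], fallback: Hashable) -> Hashable:
    """Adopt the majority-voted copy; keep the fallback if no majority exists."""
    voted = majority_vote(received)
    return fallback if voted is None else voted
```

For example, `restore_state([5, 5, 9, 5], fallback=0)` adopts `5`, because three of the four copies agree, whereas three mutually different copies yield no majority and the fallback is kept. Requiring a strict majority matches the abstract's condition that a majority of all processors are nonfaulty.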
Similar resources
Error Recovery Mechanism using Dynamic Partial Reconfiguration
In this paper an error recovery mechanism for SRAM based FPGA systems is presented. Previous recovery methods employ processor cores as a reconfiguration controller, consuming a notable amount of device resources and introducing additional error detection and recovery latency. The described mechanism is controlled by a finite state machine architecture, providing small hardware overhead and short r...
Reconfiguration and Transient Recovery in State-machine Architectures
We consider an architecture for ultra-dependable operation based on synchronized state machine replication, extended to provide transient recovery and reconfiguration in the presence of Byzantine faults. The architecture allows processors suspected of being faulty to be placed on "probation." Processors in this status cannot disrupt other processors, but those that are nonfaulty or recovering fr...
Two-Dimensional Sequential Array Architectures: Design for Testability and Reconfiguration Issues
New Design for Testability techniques aimed both at overcoming the problem of testing array architectures composed of sequential cells and guaranteeing fault tolerance through reconfiguration are proposed. Two strategies have been envisioned: (1) structural DfT techniques whose goal is to modify the interconnecting network embedding each cell, and (2) functional techniques aimed at defining a t...
Power System Transient Stability Analysis Based on the Development and Evaluation Methods
A novel method to compute the stability region in power system transient stability analysis is presented. This method is based on the set analysis. The key to this method is to construct the Hamilton-Jacobi-Isaacs (HJI) partial differential equation (PDE) of a nonlinear system, using which we can compute the backward reachable set by applying the level set methods. The backward reachable set of...
Fault tolerant system with imperfect coverage, reboot and server vacation
This study is concerned with the performance modeling of a fault tolerant system consisting of operating units supported by a combination of warm and cold spares. The on-line as well as warm standby units are subject to failures and are sent for repair to a repair facility having a single repairman who is prone to failure. If the failed unit is not detected, the system enters into an unsafe...